Linear Discriminant Analysis vs. Principal Component Analysis: Which Method is Better for Dimensionality Reduction?

January 12, 2022

Introduction

In machine learning, it's common to have a dataset with a large number of features. While many features can help build accurate models, they can also lead to overfitting and slow training. Dimensionality reduction helps with both issues by reducing the number of features in a dataset while preserving the important information. Two commonly used dimensionality reduction techniques are Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA). In this blog post, we will look at the differences between these two techniques and help you decide which one better fits your needs.

Linear Discriminant Analysis (LDA)

LDA is a supervised learning technique that is commonly used for classification problems. The goal of LDA is to find a linear combination of features that maximizes the separation between classes.
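As a quick illustration (not part of the original post), here is a minimal sketch of LDA for dimensionality reduction, assuming scikit-learn is installed. The Iris dataset (3 classes, 4 features) is used purely as an example:

```python
# Minimal LDA sketch using scikit-learn (assumed available).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# LDA can produce at most (n_classes - 1) components: 2 for Iris.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)  # labels y are required: LDA is supervised

print(X_lda.shape)  # (150, 2)
```

Note that `fit_transform` takes the labels `y`; this is exactly what makes LDA supervised, in contrast to PCA below.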

Here are some advantages of LDA:

  • LDA can handle multiple classes with a small number of samples for each class.
  • LDA can be used for feature extraction as well as dimensionality reduction.
  • LDA uses class labels, so the components it finds are chosen specifically to separate the classes.

Here are some disadvantages of LDA:

  • LDA is sensitive to outliers in the data.
  • LDA assumes that the data within each class is normally distributed and that the classes share a common covariance matrix, which may not always hold.

Principal Component Analysis (PCA)

PCA is an unsupervised learning technique that is commonly used for feature extraction and dimensionality reduction. The goal of PCA is to find a lower-dimensional representation of the data that preserves as much of the variance as possible.
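For symmetry with the LDA sketch above, here is a minimal PCA example (an illustration added for this post, assuming scikit-learn and using the Iris dataset):

```python
# Minimal PCA sketch using scikit-learn (assumed available).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)  # no labels needed: PCA is unsupervised

print(X_pca.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)  # fraction of variance kept per component
```

`explained_variance_ratio_` shows how much of the original variance each component preserves, which is the quantity PCA optimizes.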

Here are some advantages of PCA:

  • PCA is computationally efficient and can handle a large number of samples and features.
  • PCA can be used for feature extraction as well as dimensionality reduction.
  • PCA does not assume any specific distribution for the data.

Here are some disadvantages of PCA:

  • PCA does not take into account the classes or labels of the data.
  • PCA may not always be the best choice for classification problems.

Comparison of LDA and PCA

Now that we have looked at the advantages and disadvantages of LDA and PCA, let's compare them side by side:

Aspect                  LDA                                             PCA
----------------------  ----------------------------------------------  ----------------------------------------------
Supervision             Supervised (uses class labels)                  Unsupervised
Goal                    Maximize separation between classes             Maximize retained variance
Best for                Classification problems                         Feature extraction and dimensionality reduction
Assumptions             Normality, equal class covariance matrices      No distributional assumptions
Outlier sensitivity     Sensitive                                       Also sensitive (outliers inflate variance)
Variance preservation   May preserve less variance than PCA             Preserves as much variance as possible
Speed                   May be slower, depending on the number of       Generally faster than LDA
                        classes

Conclusion

Both LDA and PCA are useful techniques for dimensionality reduction. LDA is best suited for classification problems and handles multiple classes with a small number of samples for each class. PCA is best suited for feature extraction and dimensionality reduction, and can handle a large number of samples and features.


© 2023 Flare Compare